SARIMAX (Seasonal ARIMA with Exogenous Variables)#

SARIMAX extends SARIMA by adding a regression on external variables (a.k.a. exogenous regressors).

This notebook focuses on:

  • how exogenous variables enter the model

  • causal vs correlational interpretation (and what “counterfactual” means here)

  • practical requirements (alignment, scaling, forecast-time availability)

  • an end-to-end demo with NumPy + statsmodels + Plotly

Learning goals#

By the end you should be able to:

  • write SARIMAX in compact operator notation

  • explain how exog affects the conditional mean and how SARIMA handles residual autocorrelation

  • list practical requirements for exogenous variables (alignment, leakage, availability)

  • visualize the impact of exogenous variables and run scenario / counterfactual-style forecasts

  • implement a simple NumPy two-stage approximation (OLS + seasonal AR errors)

1) Model formulation (regression + SARIMA errors)#

Let \(y_t\) be the target series and \(\mathbf{x}_t \in \mathbb{R}^k\) be the vector of exogenous variables available at time \(t\).

Regression (mean) part

\[ y_t = \beta_0 + \mathbf{x}_t^\top \boldsymbol{\beta} + u_t \]

where \(u_t\) is an autocorrelated error.

SARIMA error part

Let \(B\) be the backshift operator \(B y_t = y_{t-1}\). Define:

\[ \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p,\qquad \theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q \]
\[ \Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps},\qquad \Theta(B^s) = 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs} \]

Then the seasonal ARIMA(\(p,d,q\))×(\(P,D,Q\))\(_s\) model for \(u_t\) is:

\[ \phi(B)\,\Phi(B^s)\,(1-B)^d\,(1-B^s)^D\,u_t = \theta(B)\,\Theta(B^s)\,\varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,\sigma^2) \]

Putting both together yields SARIMAX.

Equivalently, you can write the combined model as:

\[ \phi(B)\,\Phi(B^s)\,(1-B)^d\,(1-B^s)^D\,\big(y_t - \beta_0 - \mathbf{x}_t^\top\boldsymbol{\beta}\big) = \theta(B)\,\Theta(B^s)\,\varepsilon_t \]

Key intuition

  • \(\beta_0 + \mathbf{x}_t^\top\boldsymbol{\beta}\) explains the part of \(y_t\) driven by observed external signals.

  • SARIMA explains the remaining autocorrelation in the residual \(u_t\).
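The combined operator form can be sanity-checked numerically. Below is a minimal sketch (toy parameters, \(d = D = 0\), no MA terms): applying \(\phi(B)\,\Phi(B^s)\) to \(y_t - \beta_0 - \mathbf{x}_t^\top\boldsymbol{\beta}\) should leave only the white-noise innovations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 200, 12
phi, Phi = 0.5, 0.3
beta0, beta1 = 10.0, 2.0

x = rng.normal(size=n)
eps = rng.normal(size=n)

# Simulate (1 - phi B)(1 - Phi B^s) u_t = eps_t, treating u_t = 0 for t < 0.
# Note the multiplicative cross term -phi*Phi*u_{t-s-1}.
u = np.zeros(n)
for t in range(n):
    u[t] = eps[t]
    if t >= 1:
        u[t] += phi * u[t - 1]
    if t >= s:
        u[t] += Phi * u[t - s]
    if t >= s + 1:
        u[t] -= phi * Phi * u[t - s - 1]

y = beta0 + beta1 * x + u

# Applying the same AR polynomials to (y_t - beta0 - beta1 * x_t) recovers eps_t
ub = y - beta0 - beta1 * x
rec = ub[s + 1:] - phi * ub[s:-1] - Phi * ub[1:n - s] + phi * Phi * ub[: n - s - 1]
print(np.allclose(rec, eps[s + 1:]))  # True
```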

2) How exogenous variables are incorporated (precisely)#

In SARIMAX, exogenous variables enter the observation equation as a linear regression.

In statsmodels, exog is a matrix \(X\) whose \(t\)-th row is \(\mathbf{x}_t^\top\). The coefficients \(\boldsymbol{\beta}\) are typically estimated jointly with the ARIMA parameters by maximum likelihood, using a state-space representation and the Kalman filter.

Practical patterns:

  • Contemporaneous effects: include \(x_t\).

  • Lagged effects: include \(x_{t-1}, x_{t-2}, \dots\) explicitly as extra columns.

  • Event indicators / interventions: use 0/1 dummies (policy change, promotions, outages).

  • Deterministic seasonality/trend: can be included either via SARIMA terms or as regressors (seasonal dummies, Fourier terms), depending on preference.
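A minimal pandas sketch of these patterns (the column names and the intervention window are illustrative):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=36, freq="MS")
exog_demo = pd.DataFrame({"x": np.arange(36, dtype=float)}, index=idx)

# Lagged effect: x_{t-1} as an explicit extra column (first row becomes NaN)
exog_demo["x_lag1"] = exog_demo["x"].shift(1)

# Event indicator: 0/1 dummy over a known intervention window
promo_window = (exog_demo.index >= "2021-01-01") & (exog_demo.index < "2021-04-01")
exog_demo["promo"] = promo_window.astype(int)

# Deterministic seasonality via Fourier terms (period s=12, K=2 harmonics)
t = np.arange(len(exog_demo))
for k in (1, 2):
    exog_demo[f"sin{k}"] = np.sin(2 * np.pi * k * t / 12)
    exog_demo[f"cos{k}"] = np.cos(2 * np.pi * k * t / 12)

print(exog_demo.head())
```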

Interpretation of \(\beta_j\): holding the ARIMA error structure fixed, increasing \(x_{t,j}\) by 1 changes the model’s conditional mean for \(y_t\) by \(\beta_j\) units (in the scale of your transformed variables).

Econometric exogeneity (strong form) is often written as \(\mathbb{E}[\varepsilon_t \mid X_{1:T}] = 0\); if \(\mathbf{x}_t\) is correlated with the innovation \(\varepsilon_t\) (endogeneity), \(\hat{\boldsymbol{\beta}}\) is biased.
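A toy illustration of that bias (a hypothetical data-generating process where the regressor shares the innovation; the true slope is 2.0):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
eps = rng.normal(size=n)

# Endogenous regressor: x_t is partly driven by the same innovation eps_t
x = rng.normal(size=n) + 0.8 * eps
y = 2.0 * x + eps

# OLS slope; E[beta_hat] = 2 + cov(x, eps)/var(x) = 2 + 0.8/1.64, about 2.49
beta_hat = float(x @ y / (x @ x))
print("beta_hat:", beta_hat)  # noticeably above the true 2.0
```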

3) Causal vs correlational usage (and real-world examples)#

Predictive / correlational (the default)#

In most forecasting settings, SARIMAX is used to estimate conditional associations:

\[ \beta_j \approx \text{partial association between } y_t \text{ and } x_{t,j} \text{ after accounting for autocorrelation.} \]

This is often all you need for accurate forecasts.

When can coefficients be interpreted causally?#

A causal interpretation (“if we intervene on \(x\), \(y\) will change”) requires assumptions beyond SARIMAX:

  • No reverse causality: \(y\) does not influence \(x\) (or you model that feedback explicitly).

  • No omitted confounders: nothing unobserved drives both \(x\) and \(y\).

  • Correct timing: \(\mathbf{x}_t\) must be known at or before time \(t\) (no look-ahead).

  • Structural stability: the relationship is stable under the intervention you care about.

Without these, a SARIMAX “counterfactual” is a model-based scenario, not a causal estimate.

Examples#

  • Finance (predictive): forecasting volatility/returns using exogenous signals such as VIX, macro announcements, funding rates, or regime indicators. Be cautious: market variables frequently have feedback loops → endogeneity.

  • Economics (scenario analysis): forecasting inflation with oil prices and policy rates; demand with prices/promotions; unemployment with job openings. For policy effects, identification strategies (IV, DiD, SVAR, etc.) are typically needed.

4) Requirements for exogenous variables (alignment, scaling, leakage)#

Minimal practical requirements (for forecasting):

  • Same time granularity as \(y_t\) (or resampled/aggregated correctly).

  • Correct time alignment: the row \(\mathbf{x}_t\) must only use information available at time \(t\) (or earlier).

  • No unexpected missingness; handle gaps explicitly (impute, drop, or model).

  • Reasonable collinearity (high collinearity makes \(\boldsymbol{\beta}\) unstable).

  • For future forecasting, \(\mathbf{x}_{t+h}\) must be known, forecast separately, or specified as a scenario over the horizon.

Alignment checklist (avoid leakage):

  • Join \(y\) and \(X\) on the same timestamp index; confirm lengths after cleaning.

  • If a regressor is published with delay (common in macro data), shift it so \(\mathbf{x}_t\) reflects what was known at time \(t\).

  • For lagged effects, create explicit lag columns (e.g., x.shift(1)); avoid shift(-1) features unless you truly know the future.

  • Decide how to handle missing timestamps (drop vs forward-fill) based on the meaning of the variable.
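For a regressor published with a one-period delay (the series name here is made up), the shift that restores "known at time \(t\)" alignment looks like:

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=6, freq="MS")
cpi = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx, name="cpi_release")

# The January print only becomes public in February, so shift it forward:
aligned = pd.DataFrame({"cpi_release": cpi, "cpi_known_at_t": cpi.shift(1)})
print(aligned)
```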

Scaling / transforms:

  • Standardize continuous regressors using training mean/std and apply the same transform to future exog.

  • Keep binary indicators (0/1 dummies) unscaled in most cases.

  • Scaling \(x\) does not change forecasts if you transform coefficients consistently, but it can improve numerical stability of MLE.

  • Interpret \(\boldsymbol{\beta}\) in the scale of your transformed variables (log, diff, percent).
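A sketch of train-only standardization: fit the transform on the training window, then reuse the same statistics on the forecast-horizon exog (never refit over the horizon).

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(5.0, 2.0, size=100)
x_future = rng.normal(5.0, 2.0, size=12)

# Fit the transform on the training window only...
mu, sd = x_train.mean(), x_train.std()

# ...then apply the same statistics to the forecast-horizon exog
z_train = (x_train - mu) / sd
z_future = (x_future - mu) / sd
print(z_train.mean(), z_train.std())  # approximately 0 and 1 by construction
```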

Lag specification:

  • If effects are delayed, include lags explicitly; SARIMA terms capture autocorrelation, not causal delay in \(x \to y\).

Non-stationary series:

  • Differencing is applied to the ARIMA error part; if \(y\) and \(x\) are trending, you may need trends, differences, or cointegration-aware modeling to avoid spurious relationships.
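The spurious-regression risk is easy to reproduce: regress one random walk on another, in levels and in differences (toy simulation; the helper `ols_r2` is just OLS with an intercept).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
yw = np.cumsum(rng.normal(size=n))  # random walk "target"
xw = np.cumsum(rng.normal(size=n))  # independent random walk "regressor"

def ols_r2(y, x):
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(1.0 - resid.var() / y.var())

r2_levels = ols_r2(yw, xw)                    # often spuriously large
r2_diffs = ols_r2(np.diff(yw), np.diff(xw))   # near zero, as it should be
print("R^2 levels:", round(r2_levels, 3), "| R^2 diffs:", round(r2_diffs, 3))
```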

import warnings

import numpy as np
import pandas as pd

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import os
import plotly.io as pio

import statsmodels.api as sm

warnings.filterwarnings("ignore", category=UserWarning, module="joblib")

pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

SEED = 7
rng = np.random.default_rng(SEED)

import numpy, pandas, statsmodels, plotly
print("numpy", numpy.__version__)
print("pandas", pandas.__version__)
print("statsmodels", statsmodels.__version__)
print("plotly", plotly.__version__)
numpy 1.26.2
pandas 2.1.3
statsmodels 0.14.4
plotly 6.5.2
# --- Synthetic data with known exogenous effects ---
n = 180
s = 12  # monthly seasonality
idx = pd.date_range("2010-01-01", periods=n, freq="MS")

# Exogenous variables (think: policy rate + a temporary intervention)
rate = np.zeros(n)
rate[0] = 2.0
for t in range(1, n):
    rate[t] = 0.9 * rate[t - 1] + 0.2 + rng.normal(0, 0.15)

stimulus = np.zeros(n)
stimulus[60:75] = 1.0
stimulus[120:132] = 1.0

# Seasonal AR errors (additive form): u_t = phi*u_{t-1} + Phi*u_{t-s} + eps_t
# (omits the multiplicative cross term -phi*Phi*u_{t-s-1} implied by a
#  multiplicative seasonal AR; a harmless simplification for this demo)
phi = 0.5
Phi = 0.3
sigma = 1.5

u = np.zeros(n)
eps = rng.normal(0, sigma, size=n)
for t in range(n):
    ar1 = phi * u[t - 1] if t - 1 >= 0 else 0.0
    sar1 = Phi * u[t - s] if t - s >= 0 else 0.0
    u[t] = ar1 + sar1 + eps[t]

# True regression effects
beta0_true = 50.0
beta_rate_true = -4.0
beta_stimulus_true = 8.0

y = beta0_true + beta_rate_true * rate + beta_stimulus_true * stimulus + u

df = pd.DataFrame({"y": y, "rate": rate, "stimulus": stimulus}, index=idx)
df.head()
y rate stimulus
2010-01-01 41.949314 2.000000 0.0
2010-02-01 41.312201 2.000185 0.0
2010-03-01 40.714617 2.044978 0.0
2010-04-01 42.394950 1.999359 0.0
2010-05-01 42.280054 1.865835 0.0
# Plot the target and exogenous drivers
fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(
    go.Scatter(x=df.index, y=df["y"], name="y (target)", line=dict(color="black")),
    secondary_y=False,
)
fig.add_trace(
    go.Scatter(x=df.index, y=df["rate"], name="rate (exog)", line=dict(color="#4E79A7")),
    secondary_y=True,
)
fig.add_trace(
    go.Bar(x=df.index, y=df["stimulus"], name="stimulus (exog)", opacity=0.25, marker_color="#E15759"),
    secondary_y=True,
)

fig.update_layout(
    title="Synthetic series with exogenous drivers",
    xaxis_title="Date",
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0),
)
fig.update_yaxes(title_text="y", secondary_y=False)
fig.update_yaxes(title_text="rate / stimulus", secondary_y=True)
fig.show()
# --- Fit SARIMAX (with exog) vs SARIMA (no exog) on a holdout split ---
h = 24
train = df.iloc[:-h]
test = df.iloc[-h:]

exog_cols = ["stimulus", "rate"]

order = (1, 0, 0)
seasonal_order = (1, 0, 0, 12)

res_x = sm.tsa.SARIMAX(
    train["y"],
    exog=train[exog_cols],
    order=order,
    seasonal_order=seasonal_order,
    trend="c",
    enforce_stationarity=True,
    enforce_invertibility=True,
).fit(disp=False, method="lbfgs", maxiter=500)

res_0 = sm.tsa.SARIMAX(
    train["y"],
    order=order,
    seasonal_order=seasonal_order,
    trend="c",
    enforce_stationarity=True,
    enforce_invertibility=True,
).fit(disp=False, method="lbfgs", maxiter=500)

fcst_x = res_x.get_forecast(steps=h, exog=test[exog_cols])
fcst_0 = res_0.get_forecast(steps=h)

pred_x = fcst_x.predicted_mean
pred_0 = fcst_0.predicted_mean


def rmse(y_true, y_pred):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))


print("Test RMSE (with exog):", rmse(test["y"], pred_x))
print("Test RMSE (no exog): ", rmse(test["y"], pred_0))
print("Test MAE  (with exog):", mae(test["y"], pred_x))
print("Test MAE  (no exog): ", mae(test["y"], pred_0))

res_x.summary()
Test RMSE (with exog): 1.6851935314621316
Test RMSE (no exog):  1.7018128229455727
Test MAE  (with exog): 1.370597378351971
Test MAE  (no exog):  1.4392971938785968
                                     SARIMAX Results
==========================================================================================
Dep. Variable:                                  y   No. Observations:                  156
Model:             SARIMAX(1, 0, 0)x(1, 0, 0, 12)   Log Likelihood                -276.615
Date:                            Sat, 31 Jan 2026   AIC                            565.229
Time:                                    12:35:10   BIC                            583.528
Sample:                                01-01-2010   HQIC                           572.661
                                     - 12-01-2022
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept     18.2550      3.754      4.863      0.000      10.897      25.613
stimulus       7.3388      0.475     15.462      0.000       6.409       8.269
rate          -4.3796      0.666     -6.577      0.000      -5.685      -3.074
ar.L1          0.5277      0.067      7.872      0.000       0.396       0.659
ar.S.L12       0.2335      0.107      2.183      0.029       0.024       0.443
sigma2         2.0176      0.229      8.829      0.000       1.570       2.466
===================================================================================
Ljung-Box (L1) (Q):                   0.33   Jarque-Bera (JB):                 2.59
Prob(Q):                              0.56   Prob(JB):                         0.27
Heteroskedasticity (H):               0.88   Skew:                            -0.30
Prob(H) (two-sided):                  0.65   Kurtosis:                         3.21
===================================================================================


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
# Plot holdout forecasts
ci_x = fcst_x.conf_int()
lower_x = ci_x.iloc[:, 0]
upper_x = ci_x.iloc[:, 1]

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=train.index,
        y=train["y"],
        name="train y",
        line=dict(color="rgba(0,0,0,0.35)"),
    )
)
fig.add_trace(go.Scatter(x=test.index, y=test["y"], name="test y", line=dict(color="black")))
fig.add_trace(
    go.Scatter(
        x=test.index,
        y=pred_0,
        name="forecast: SARIMA (no exog)",
        line=dict(color="#F28E2B"),
    )
)
fig.add_trace(
    go.Scatter(
        x=test.index,
        y=pred_x,
        name="forecast: SARIMAX (with exog)",
        line=dict(color="#4E79A7"),
    )
)
fig.add_trace(
    go.Scatter(x=test.index, y=lower_x, showlegend=False, line=dict(color="rgba(78,121,167,0.0)"))
)
fig.add_trace(
    go.Scatter(
        x=test.index,
        y=upper_x,
        showlegend=False,
        line=dict(color="rgba(78,121,167,0.0)"),
        fill="tonexty",
        fillcolor="rgba(78,121,167,0.15)",
    )
)

fig.add_vline(x=test.index[0], line_dash="dash", line_color="black")
fig.update_layout(
    title="Forecast comparison (holdout): SARIMA vs SARIMAX",
    xaxis_title="Date",
    yaxis_title="y",
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0),
)
fig.show()

Note on the intercept in ARMA/SARIMA#

In many ARMA/SARIMA parameterizations, the reported intercept is the ARMA constant \(c\), not the unconditional mean \(\mu\).

For an AR(1)×seasonal AR(1) model, the implied mean offset is:

\[ \mu = \frac{c}{(1-\phi_1)(1-\Phi_1)} \]

The chart below uses this implied \(\mu\) when drawing \(\mu + X\hat\beta\).

# --- Plotly chart: estimated impact of exogenous variables ---
params = res_x.params.copy()
beta_hat = params.reindex(exog_cols).astype(float)

intercept = float(params.get("intercept", params.get("const", 0.0)))
phi1 = float(params.get("ar.L1", 0.0))
Phi1 = float(params.get("ar.S.L12", 0.0))
phi_at_1 = (1.0 - phi1) * (1.0 - Phi1)
mu = intercept / phi_at_1 if phi_at_1 != 0.0 else intercept

contrib = df[exog_cols].to_numpy() * beta_hat.to_numpy()  # (n, k) * (k,)
total_contrib = contrib.sum(axis=1)
mean_part = mu + total_contrib

fig = make_subplots(
    rows=2,
    cols=1,
    shared_xaxes=True,
    vertical_spacing=0.07,
    subplot_titles=(
        "Observed y vs estimated exog-driven mean (μ + Xβ)",
        "Estimated contribution of each exogenous variable",
    ),
)

fig.add_trace(go.Scatter(x=df.index, y=df["y"], name="y", line=dict(color="black")), row=1, col=1)
fig.add_trace(
    go.Scatter(x=df.index, y=mean_part, name="μ + Xβ", line=dict(color="#4E79A7")),
    row=1,
    col=1,
)

for j, col in enumerate(exog_cols):
    fig.add_trace(
        go.Scatter(x=df.index, y=contrib[:, j], name=f"{col} contribution"),
        row=2,
        col=1,
    )

fig.add_trace(
    go.Scatter(
        x=df.index,
        y=total_contrib,
        name="total Xβ",
        line=dict(color="rgba(0,0,0,0.6)", dash="dash"),
    ),
    row=2,
    col=1,
)

fig.update_yaxes(title_text="y", row=1, col=1)
fig.update_yaxes(title_text="contribution", row=2, col=1)
fig.update_layout(
    title="Impact of exogenous variables (estimated from SARIMAX fit)",
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0),
)
fig.show()

beta_hat
stimulus    7.338761
rate       -4.379605
dtype: float64
# --- Plotly chart: counterfactual-style forecast scenarios ---
# IMPORTANT: this is a model-based conditional scenario, not necessarily a causal effect.
# NOTE: in this synthetic split the observed stimulus is already 0 throughout the
# holdout window, so the "stimulus=0" counterfactual coincides with the baseline
# and the effect below is identically zero. Set exog_cf["stimulus"] = 1.0 instead
# to explore a nonzero "always-on stimulus" scenario.

exog_base = test[exog_cols].copy()
exog_cf = exog_base.copy()
exog_cf["stimulus"] = 0.0

pred_base = res_x.get_forecast(steps=h, exog=exog_base).predicted_mean
pred_cf = res_x.get_forecast(steps=h, exog=exog_cf).predicted_mean

effect = pred_base - pred_cf

fig = go.Figure()
fig.add_trace(
    go.Scatter(x=train.index, y=train["y"], name="train y", line=dict(color="rgba(0,0,0,0.35)"))
)
fig.add_trace(go.Scatter(x=test.index, y=test["y"], name="test y", line=dict(color="black")))
fig.add_trace(
    go.Scatter(
        x=test.index,
        y=pred_base,
        name="baseline forecast (observed exog)",
        line=dict(color="#4E79A7"),
    )
)
fig.add_trace(
    go.Scatter(
        x=test.index,
        y=pred_cf,
        name="counterfactual (stimulus=0)",
        line=dict(color="#E15759", dash="dash"),
    )
)
fig.add_vline(x=test.index[0], line_dash="dash", line_color="black")
fig.update_layout(
    title="Counterfactual-style forecast: set stimulus=0 in the horizon",
    xaxis_title="Date",
    yaxis_title="y",
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0),
)
fig.show()

go.Figure(
    data=[go.Bar(x=test.index, y=effect, name="baseline - counterfactual")],
    layout=go.Layout(
        title="Estimated stimulus effect in the forecast horizon (model-based)",
        xaxis_title="Date",
        yaxis_title="Δy",
    ),
).show()

effect.describe()
count    24.0
mean      0.0
std       0.0
min       0.0
25%       0.0
50%       0.0
75%       0.0
max       0.0
Name: predicted_mean, dtype: float64

5) Low-level NumPy implementation (educational approximation)#

A full SARIMAX fit typically uses maximum likelihood via a state-space model and the Kalman filter (this is what statsmodels does).

To keep the mechanics transparent, the next section implements a simple two-stage approximation:

  1. Fit the regression \(y_t \approx \beta_0 + \mathbf{x}_t^\top\boldsymbol{\beta}\) by OLS.

  2. Fit a seasonal AR model on the residuals:

\[ u_t \approx \varphi_1 u_{t-1} + \Phi_1 u_{t-s} + e_t \]
  3. Forecast by combining the regression mean and the recursive residual forecast.

This corresponds to an ARIMAX-like model with no MA terms and no joint estimation; it is useful for intuition, not as a drop-in replacement for SARIMAX MLE.

def add_intercept(X: np.ndarray) -> np.ndarray:
    return np.column_stack([np.ones((X.shape[0], 1)), X])


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_seasonal_ar(residuals: np.ndarray, lags: list[int]) -> np.ndarray:
    max_lag = max(lags)
    y = residuals[max_lag:]
    X = np.column_stack([residuals[max_lag - lag : -lag] for lag in lags])
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    return params


def forecast_seasonal_ar(
    residuals_history: np.ndarray,
    ar_params: np.ndarray,
    lags: list[int],
    steps: int,
) -> np.ndarray:
    max_lag = max(lags)
    hist = residuals_history.astype(float).copy()
    if hist.shape[0] < max_lag:
        raise ValueError("Need at least max(lags) residual history values.")

    preds: list[float] = []
    for _ in range(steps):
        t = hist.shape[0]
        r_hat = 0.0
        for coef, lag in zip(ar_params, lags):
            r_hat += float(coef) * float(hist[t - lag])
        preds.append(r_hat)
        hist = np.append(hist, r_hat)

    return np.asarray(preds)


# Stage 1: OLS regression
X_train = add_intercept(train[exog_cols].to_numpy())
y_train = train["y"].to_numpy()

beta_ols = fit_ols(X_train, y_train)
resid_train = y_train - X_train @ beta_ols

# Stage 2: seasonal AR on residuals (lags 1 and 12)
lags = [1, 12]
ar_params = fit_seasonal_ar(resid_train, lags)

# Forecast residuals and combine with regression mean
resid_fcst = forecast_seasonal_ar(resid_train, ar_params, lags, steps=h)

X_test = add_intercept(test[exog_cols].to_numpy())
pred_numpy = X_test @ beta_ols + resid_fcst

print("OLS betas (intercept, stimulus, rate):", beta_ols)
print("Seasonal AR params (lag 1, lag 12):", ar_params)
print("Test RMSE (NumPy two-stage):", rmse(test["y"], pred_numpy))

fig = go.Figure()
fig.add_trace(go.Scatter(x=test.index, y=test["y"], name="test y", line=dict(color="black")))
fig.add_trace(go.Scatter(x=test.index, y=pred_x, name="SARIMAX forecast", line=dict(color="#4E79A7")))
fig.add_trace(
    go.Scatter(
        x=test.index,
        y=pred_numpy,
        name="NumPy two-stage",
        line=dict(color="#59A14F", dash="dash"),
    )
)
fig.update_layout(
    title="Holdout forecast: statsmodels SARIMAX vs NumPy two-stage approximation",
    xaxis_title="Date",
    yaxis_title="y",
)
fig.show()
OLS betas (intercept, stimulus, rate): [51.76806591  7.28392131 -5.13479592]
Seasonal AR params (lag 1, lag 12): [0.51265418 0.20647124]
Test RMSE (NumPy two-stage): 1.6113110846360617

Pitfalls + diagnostics#

  • Leakage (alignment errors): if \(x_t\) includes information from the future (even subtly via rolling features), you will overestimate performance.

  • Endogeneity: in many finance/econ settings, \(x\) and \(y\) influence each other. SARIMAX can forecast, but coefficients can be misleading as “effects”.

  • Collinearity: correlated regressors inflate variance in \(\hat{\beta}\); interpret with caution.

  • Forecasting exog: SARIMAX needs future \(\mathbf{x}_{t+h}\); if it cannot be known, you must forecast it separately or supply scenario ranges.

  • Residual checks: inspect residual ACF/PACF and run stationarity tests on residuals; remaining structure suggests mis-specified orders or missing regressors.

  • Order selection: start simple (small p/q/P/Q) and validate on rolling-origin backtests.

Exercises + references#

Exercises:

  1. Add a lagged version of rate (e.g., rate.shift(1)) and see how coefficients and forecasts change.

  2. Introduce a trend in rate and compare: include a trend term vs differencing.

  3. Replace the binary stimulus with a continuous “spend” variable and examine scaling/standardization.

  4. Do a rolling backtest and compare SARIMA vs SARIMAX stability over time.

References:

  • statsmodels.tsa.statespace.SARIMAX documentation (state-space + Kalman filter implementation).

  • Box, Jenkins, Reinsel & Ljung: Time Series Analysis: Forecasting and Control (classic ARIMA/SARIMA reference).